Tied biases vs untied biases

For a convolution layer you can choose between tied and untied biases. With tied biases, the same bias is applied at every location in a feature map; with untied biases, each location in a feature map gets its own bias. (Currently, Blocks only supports untied biases, but I made a PR that should add support for tied biases soon.)
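To make the difference concrete, here is a minimal NumPy sketch (not Blocks code; the shapes are made up for illustration) of how the two kinds of bias are applied to a convolution's output:

```python
import numpy as np

# Output of a hypothetical convolution: (batch, channels, height, width).
batch, channels, height, width = 2, 4, 8, 8
feature_maps = np.random.randn(batch, channels, height, width)

# Tied biases: one bias per feature map, broadcast over every
# spatial location of that map.
tied_bias = np.zeros(channels)
out_tied = feature_maps + tied_bias.reshape(1, channels, 1, 1)

# Untied biases: a separate bias for every spatial location of
# every feature map.
untied_bias = np.zeros((channels, height, width))
out_untied = feature_maps + untied_bias.reshape(1, channels, height, width)
```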

Intuitively, I would say that it makes more sense to use tied biases, since the weights are also tied. However, Alexandre and Bart have reported that untied biases train much faster. I can see that untied biases add extra capacity to the model, but I didn't expect the difference to be that big.
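The extra capacity is easy to quantify: tied biases contribute one parameter per feature map, while untied biases contribute one per spatial location of each feature map. A quick back-of-the-envelope calculation, for hypothetical layer dimensions (not the ones from this experiment):

```python
# Bias parameter counts for a single convolution layer
# with made-up dimensions, purely for illustration.
channels, height, width = 32, 24, 24

tied_params = channels                     # one bias per feature map
untied_params = channels * height * width  # one bias per location

print(tied_params)    # 32
print(untied_params)  # 18432
```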

To verify their results, I ran the same experiment as in the previous post, but with tied biases. The results below indeed confirm that untied biases lead to better performance for this architecture.

[Figure: validation error for tied vs untied biases (tied_vs_untied_error)]

That untied biases perform better might also be explained by the fact that we are underfitting. Indeed, if we look at the error plot in the previous post, we see that even with untied biases we are not overfitting: the validation error/NLL flattens out but does not increase. Tied biases might be a good idea after all if we increase the capacity of the model somewhere else (e.g. by increasing the number of feature maps).
